How to fit an ordinal regression (unconditional/conditional) in the Bayesian framework
Compare 2 groups
Ordinal responses are ordered categories; examples include Likert scales of agreement with attitude statements (e.g., disagree, neither agree nor disagree, agree) and self-reported frequencies of doing something, such as helping children with homework (e.g., daily, several times per week, occasionally, never).
Ordinal responses are often modeled with probit regression.
In the probit approach there is a Gaussian latent variable \(z \sim N(\mu, \sigma)\) that may depend on some predictors.
The latent variable \(z\) defines the ordinal output \(y = k,\ k \in \{1, \dots, K\}\), through the formulas
\[\begin{align*} P(y = 1 \mid \mu, \sigma, \{\theta_j\}) &= \Phi\Big(\frac{\theta_1 - \mu}{\sigma}\Big) \\ P(y = k \mid \mu, \sigma, \{\theta_j\}) &= \Phi\Big(\frac{\theta_k - \mu}{\sigma}\Big) - \Phi\Big(\frac{\theta_{k-1} - \mu}{\sigma}\Big), \quad &k = 2, \dots, K-1 \\ P(y = K \mid \mu, \sigma, \{\theta_j\}) &= 1 - \Phi\Big(\frac{\theta_{K-1} - \mu}{\sigma}\Big) \end{align*}\]
Consider the simplest case: the latent variable is standard normal, there are no predictors, and the threshold parameters have a uniform prior.
There is only one ordinal variable.
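Before turning to Stan, the formulas above can be checked numerically. Below is a small stdlib-Python sketch (Python only so the snippet is self-contained; the thresholds, mu and sigma are made-up illustrative values, not estimates from this post):

```python
# Illustrative check of the ordinal-probit formulas above.
# The thresholds and (mu, sigma) here are made-up values for demonstration.
from statistics import NormalDist

def ordinal_probit_probs(thresholds, mu=0.0, sigma=1.0):
    """P(y = k), k = 1..K, given ordered thresholds theta_1 < ... < theta_{K-1}."""
    Phi = NormalDist().cdf                     # standard normal CDF
    z = [Phi((t - mu) / sigma) for t in thresholds]
    probs = [z[0]]                             # k = 1
    probs += [z[k] - z[k - 1] for k in range(1, len(z))]  # k = 2..K-1
    probs.append(1.0 - z[-1])                  # k = K
    return probs

p = ordinal_probit_probs([-1.0, 0.0, 1.0])     # K = 4 categories
print([round(x, 4) for x in p])                # [0.1587, 0.3413, 0.3413, 0.1587]
```

The probabilities sum to 1 by construction, since consecutive CDF differences partition the real line.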
modelString <- "
data {
int<lower=2> K; // num of y classes
int<lower=0> N; // num of observations
int<lower=1,upper=K> y[N];
}
parameters {
ordered[K-1] c; // thresholds
}
model {
  vector[K] theta;
  // theta does not depend on n, so compute it once outside the loop
  theta[1] = Phi(c[1]);
  for (k in 2:(K-1))
    theta[k] = Phi(c[k]) - Phi(c[k-1]);
  theta[K] = 1 - Phi(c[K-1]);
  for (n in 1:N)
    y[n] ~ categorical(theta);
}
"
Note that in the model above no prior is defined for c.
An omitted prior is equivalent to a uniform prior: if the variable is declared with bounds, the implicit prior is uniform over the bounded interval; if the variable is unbounded, the prior is an improper uniform.
Read the data.
The sample is just one column of ratings between 1 and 7.
Calculate frequencies of different levels of the rating.
mdta = read.csv(file = "data/OrdinalProbitData-1grp.csv")
head(mdta)
Y
1 1
2 1
3 1
4 1
5 1
6 1
table(mdta$Y)
1 2 3 4 5 6 7
58 15 12 8 4 2 1
(frequencies <- prop.table(table(mdta$Y)))
1 2 3 4 5 6 7
0.58 0.15 0.12 0.08 0.04 0.02 0.01
Compile the model DSO and run Markov Chains.
model <- stan_model(model_code=modelString)
fit <- sampling(model,
                data=list(N=nrow(mdta),  # num of observations
                          K=max(mdta$Y), # num of outcome classes
                          y=mdta$Y),
                pars=c('c'),
                iter=5000, chains = 2, cores = 2)
# analyze fitted model using shinystan
launch_shinystan(fit)
Check the chains.
pairs(fit)
(fitResults <- summary(fit)$summary[,c(1,4,6,8,10)])
mean 2.5% 50% 97.5% Rhat
c[1] 0.1811849 -0.06299102 0.1822527 0.4300299 0.9997891
c[2] 0.6070660 0.35191827 0.6040370 0.8689709 0.9998762
c[3] 1.0480714 0.74641862 1.0463543 1.3666270 0.9998301
c[4] 1.5156082 1.16003395 1.5070613 1.9045675 0.9999335
c[5] 1.9875471 1.53136095 1.9734539 2.5301832 1.0000932
c[6] 2.6172553 1.94435433 2.5745726 3.5445753 0.9998225
lp__ -138.0311040 -142.37436655 -137.6879538 -135.5741869 0.9998792
Returned parameters are 6 thresholds separating 7 ordinal categories.
stan_ac(fit, separate_chains = T)
stan_trace(fit)
stan_dens(fit)
Compare the estimated thresholds, transformed to probabilities by pnorm, with the cumulative frequencies of the sample.
cbind(Cumulativefrequencies=head(cumsum(frequencies)),
EstimatedProbabilities=pnorm(head(fitResults[,1])))
Cumulativefrequencies EstimatedProbabilities
1 0.58 0.5718888
2 0.73 0.7280965
3 0.85 0.8526971
4 0.93 0.9351908
5 0.97 0.9765691
6 0.99 0.9955680
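As a quick sanity check outside of R, the same comparison can be reproduced with stdlib Python: applying the standard normal CDF to the posterior mean thresholds reported above should roughly recover the sample's cumulative frequencies (the numbers below are copied from the tables in this post):

```python
# Posterior mean thresholds and sample cumulative frequencies from the
# tables above; Phi(c_k) should approximate the cumulative frequencies.
from statistics import NormalDist

post_mean_c = [0.1811849, 0.6070660, 1.0480714, 1.5156082, 1.9875471, 2.6172553]
cum_freq    = [0.58, 0.73, 0.85, 0.93, 0.97, 0.99]

Phi = NormalDist().cdf
for c, f in zip(post_mean_c, cum_freq):
    print(round(Phi(c), 4), f)   # e.g. 0.5719 vs 0.58
```

The agreement is close but not exact; the largest gap (at the first threshold) is under 0.01.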
Plot HDI of the parameters.
plot(fit)
The HDI intervals overlap, which may seem counter-intuitive because the thresholds have to be ordered. The ordering constraint holds within each individual draw; the marginal intervals of neighboring thresholds can still overlap.
Extract the chains.
fit_ext <- rstan::extract(fit)
fit_ext <- fit_ext$c
head(fit_ext)
iterations [,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.35116666 0.7537651 0.9995782 1.288744 1.567238 2.221483
[2,] 0.09469918 0.5023792 0.9997730 1.284909 1.948384 2.698532
[3,] 0.06120643 0.7224987 1.0377697 1.395147 1.999715 2.478845
[4,] 0.03202448 0.4761984 0.9188529 1.516598 1.913221 2.842915
[5,] 0.21959980 0.5235180 1.2094977 1.838333 2.382429 2.755269
[6,] 0.06633391 0.6574331 1.1090569 1.551895 1.889166 2.678495
Plot first 6 rows of the chains to see that at each step of the chain thresholds are ordered.
plot(fit_ext[1,],rep(1,6),ylim=c(1,6),xlim=c(0,3),col=1,pch=16)
points(fit_ext[2,],rep(2,6),col=2,pch=16)
points(fit_ext[3,],rep(3,6),col=3,pch=16)
points(fit_ext[4,],rep(4,6),col=4,pch=16)
points(fit_ext[5,],rep(5,6),col=5,pch=16)
points(fit_ext[6,],rep(6,6),col=6,pch=16)
Two ordinal variables may need to be compared, for example, in the following situations:
1- Two groups of participants are asked to answer a questionnaire with
the same scale of ordinal categories, for example, “Strongly
Disagree”, “Disagree”, “Undecided”,
“Agree”, “Strongly Agree”. But the two groups are asked
about their agreement/disagreement with two different statements about
social issues: “Left-handed people should have equal rights under the
law” for group 1, and “Disabled people should be given equal rights
under the law” for group 2.
The assumption is that people in both groups have the same latent
variable, call it “sense of fairness”, but the two
social issues elicit different responses, formalized as different mean
values and standard deviations of the latent variable with respect to
the issues.
The goal of the experiment may be the analysis of the contrast between
the two statements, i.e. how significant the difference between the two
statements is.
2- Two groups of participants are asked to rate two different products and select a level of satisfaction from 1 (lowest) to 10 (highest). Again the scale of ordinal responses is the same. The assumption is that the latent variable of evaluation of each product, “satisfaction”, is shared by both groups of participants, but different products result in different mean values and variances of the latent variable. The goal of the study is understanding the difference between the products.
Write the model description for 2 nominal groups.
modelString <- "
data {
int<lower=2> K; // num of y classes
int<lower=0> N; // num of observations
int<lower=1> D; // num of groups
int<lower=1,upper=K> y[N]; // response
int<lower=1,upper=D> x[N]; // group index: each observation has its own group
}
parameters {
ordered[K-3] c_raw;
vector<lower=K/1000.0, upper=K*10.0>[D] sigma; // group sigma; bounds match the uniform prior below
vector[D] beta; // group mean
}
transformed parameters{ // renormalize vector c
vector[K-1] c;
c[1] = 1.5;
for (k in 2:(K-2))
c[k] = 1.5 + (K - 2.0)* inv_logit(c_raw[k-1]); // sigmoid
c[K-1] = K - 0.5;
}
model {
vector[K] theta;
real mu;
real sgm;
beta ~ normal((1+K)/2.0, K);
sigma ~ uniform(K/1000.0, K * 10.0);
for (n in 1:N) {
mu = beta[x[n]];
sgm = sigma[x[n]];
theta[1] = Phi((c[1] - mu)/sgm);
for (k in 2:(K-1))
theta[k] = Phi((c[k] - mu)/sgm) - Phi((c[k-1] - mu)/sgm);
theta[K] = 1 - Phi((c[K-1] - mu)/sgm);
y[n] ~ categorical(theta);
}
}"
Re-scaling the thresholds (adding a constant and dividing by a
constant) does not change the probabilities if the same transformation
is applied to the latent distribution.
To remove these degrees of freedom, “pin down” the extreme thresholds
at 1.5 and \(K - 0.5\) respectively and re-scale the remaining
thresholds between them.
This is what the “transformed parameters” block above does.
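The pinning transform is easy to sketch outside Stan. Below is a stdlib-Python version (the `c_raw` values are made up; they only need to be ordered, as in the Stan `ordered` type):

```python
# Sketch of the threshold "pinning" from the transformed parameters block:
# the extreme thresholds are fixed at 1.5 and K - 0.5, and the K - 3 interior
# raw ordered values are squeezed between them with a sigmoid.
import math

def pin_thresholds(c_raw, K):
    """Map an ordered vector c_raw (length K - 3) to K - 1 pinned thresholds."""
    inv_logit = lambda x: 1.0 / (1.0 + math.exp(-x))
    c = [1.5]                                             # c[1] pinned
    c += [1.5 + (K - 2.0) * inv_logit(r) for r in c_raw]  # interior, in (1.5, K - 0.5)
    c.append(K - 0.5)                                     # c[K-1] pinned
    return c

K = 5
c = pin_thresholds([-0.3, 0.8], K)   # made-up ordered c_raw
print([round(x, 3) for x in c])
```

Because `inv_logit` is monotone and bounded in (0, 1), an ordered `c_raw` always yields ordered thresholds strictly inside the pinned endpoints.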
Create DSO.
model <- stan_model(model_code=modelString)
Read the data.
mdt = read.csv(file = "data/OrdinalProbitData1.csv", stringsAsFactors=TRUE)
head(mdt)
X Y
1 A 1
2 A 1
3 A 1
4 A 1
5 A 1
6 A 1
table(mdt)
Y
X 1 2 3 4 5
A 31 8 4 1 0
B 22 11 7 3 1
The dataset now has 2 groups A and B, showing different distributions on the 5 ordinal categories.
Run the chains.
fit <- sampling(model,
                data=list(N=nrow(mdt),  # num of observations
                          K=max(mdt$Y), # num of outcome classes
                          D=nlevels(mdt$X),
                          y=mdt$Y,
                          x=as.integer(mdt$X)),
                pars=c('c', 'beta', 'sigma'),
                iter=5000, chains = 2, cores = 2)
Analyze the fitted model.
pairs(fit)
stan_ac(fit, separate_chains = T)
stan_trace(fit)
stan_dens(fit)
summary(fit)
$summary
mean se_mean sd 2.5% 25%
c[1] 1.5000000 NaN 0.0000000 1.5000000 1.5000000
c[2] 2.5687264 0.004564235 0.2495291 2.1297057 2.3935228
c[3] 3.6650824 0.005892113 0.3162772 3.0108633 3.4515403
c[4] 4.5000000 NaN 0.0000000 4.5000000 4.5000000
beta[1] 0.4810617 0.012087812 0.5324673 -0.8054914 0.2065422
beta[2] 1.4104089 0.006548890 0.3521547 0.6278093 1.1962206
sigma[1] 1.7714314 0.013797419 0.5486677 0.9494024 1.3924604
sigma[2] 1.7536222 0.007729298 0.3849482 1.1429208 1.4888102
lp__ -96.5054790 0.051380648 2.0078407 -101.2465659 -97.5325740
50% 75% 97.5% n_eff Rhat
c[1] 1.5000000 1.5000000 1.500000 NaN NaN
c[2] 2.5505579 2.7310103 3.088581 2988.867 0.9997306
c[3] 3.6867641 3.9003436 4.209861 2881.335 0.9996982
c[4] 4.5000000 4.5000000 4.500000 NaN NaN
beta[1] 0.5700466 0.8559553 1.262025 1940.397 0.9999410
beta[2] 1.4373681 1.6476200 2.032807 2891.555 1.0006034
sigma[1] 1.6861781 2.0671413 3.075729 1581.332 0.9999935
sigma[2] 1.7001947 1.9638541 2.649887 2480.416 1.0005770
lp__ -96.1183970 -95.0223348 -93.752522 1527.071 0.9999967
$c_summary
, , chains = chain:1
stats
parameter mean sd 2.5% 25% 50%
c[1] 1.5000000 0.0000000 1.5000000 1.5000000 1.5000000
c[2] 2.5656778 0.2504301 2.1163268 2.3908231 2.5493339
c[3] 3.6688164 0.3169793 3.0266797 3.4572647 3.6870800
c[4] 4.5000000 0.0000000 4.5000000 4.5000000 4.5000000
beta[1] 0.4831698 0.5340802 -0.7738274 0.1930877 0.5734193
beta[2] 1.4199808 0.3544642 0.6193359 1.2114230 1.4483583
sigma[1] 1.7703654 0.5460767 0.9637670 1.3980432 1.6967757
sigma[2] 1.7439336 0.3835729 1.1482349 1.4754805 1.6862058
lp__ -96.5209197 2.0575388 -101.4332967 -97.5533587 -96.1147975
stats
parameter 75% 97.5%
c[1] 1.500000 1.500000
c[2] 2.725942 3.076784
c[3] 3.911798 4.210600
c[4] 4.500000 4.500000
beta[1] 0.859498 1.268055
beta[2] 1.652155 2.050165
sigma[1] 2.061358 3.023260
sigma[2] 1.950170 2.643014
lp__ -95.038429 -93.752162
, , chains = chain:2
stats
parameter mean sd 2.5% 25% 50%
c[1] 1.5000000 0.0000000 1.5000000 1.5000000 1.5000000
c[2] 2.5717750 0.2486376 2.1467098 2.3946781 2.5513959
c[3] 3.6613483 0.3155928 3.0004842 3.4476953 3.6846535
c[4] 4.5000000 0.0000000 4.5000000 4.5000000 4.5000000
beta[1] 0.4789535 0.5309480 -0.8431789 0.2261896 0.5658407
beta[2] 1.4008371 0.3496387 0.6568121 1.1796308 1.4286961
sigma[1] 1.7724974 0.5513536 0.9431984 1.3861733 1.6731259
sigma[2] 1.7633108 0.3861523 1.1305726 1.4973936 1.7159548
lp__ -96.4900383 1.9571712 -100.9751571 -97.5118612 -96.1227563
stats
parameter 75% 97.5%
c[1] 1.5000000 1.500000
c[2] 2.7343485 3.099431
c[3] 3.8882334 4.209746
c[4] 4.5000000 4.500000
beta[1] 0.8554803 1.243746
beta[2] 1.6452455 2.006866
sigma[1] 2.0728024 3.100717
sigma[2] 1.9807715 2.657443
lp__ -95.0109336 -93.759982
plot(fit,pars=c("beta"))
The HDIs show that even at the 80% level the difference between the two group means is not significant.
plot(fit,pars=c("sigma"))
The standard deviations are practically identical.
What would the frequentist (FNP) approach conclude about these data?
First, assume that the data are of metric type and check whether a t-test is able to distinguish the means.
subsetA <- subset(mdt,X=="A")
subsetB <- subset(mdt,X=="B")
t.test(subsetA$Y,subsetB$Y)
Welch Two Sample t-test
data: subsetA$Y and subsetB$Y
t = -2.1838, df = 77.571, p-value = 0.032
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.82551722 -0.03811914
sample estimates:
mean of x mean of y
1.431818 1.863636
At any significance level greater than 0.032 the hypothesis of equal means is rejected.
In the book this is explained by non-normality of the data.
Confirm this with qq-plot.
qqnorm(subsetA$Y)
qqline(subsetA$Y)
qqnorm(subsetB$Y)
qqline(subsetB$Y)
The distributions are indeed not normal. But the main reason for rejecting the null hypothesis is probably the discrete format of the data.
Compare the samples assuming that they have a multinomial distribution, then apply a \(\chi^2\) test.
chisq.test(c(table(subsetA$Y)/length(subsetA$Y),0),table(subsetB$Y)/length(subsetB$Y))
Pearson's Chi-squared test
data: c(table(subsetA$Y)/length(subsetA$Y), 0) and table(subsetB$Y)/length(subsetB$Y)
X-squared = 20, df = 16, p-value = 0.2202
With a properly chosen method, the hypothesis of equal distributions is not rejected.
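For intuition, the classical two-sample homogeneity \(\chi^2\) statistic can also be computed by hand on the raw counts from table(mdt) above (a slightly different formalization than the chisq.test call on proportions; stdlib Python only, so only the statistic is computed and compared to the 5% critical value for df = 4, about 9.488):

```python
# Two-sample chi-square homogeneity statistic on the raw counts from
# table(mdt): rows A and B over the 5 ordinal categories.
counts_A = [31, 8, 4, 1, 0]
counts_B = [22, 11, 7, 3, 1]

nA, nB = sum(counts_A), sum(counts_B)
N = nA + nB
chi2 = 0.0
for a, b in zip(counts_A, counts_B):
    col = a + b
    eA, eB = nA * col / N, nB * col / N   # expected counts under equal distributions
    chi2 += (a - eA) ** 2 / eA + (b - eB) ** 2 / eB

print(round(chi2, 2))   # 4.82, well below the 9.488 critical value: no rejection
```

This agrees with the conclusion above; note, though, that the \(\chi^2\) approximation is shaky here because several expected counts fall below 5.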
Plot the draw-wise row means of the thresholds against the simulated thresholds.
draws <- extract(fit)
c1 <- draws$c[,1]
c2 <- draws$c[,2]
c3 <- draws$c[,3]
c4 <- draws$c[,4]
mc <- rowMeans(draws$c)
head(cbind(c1,c4))
c1 c4
[1,] 1.5 4.5
[2,] 1.5 4.5
[3,] 1.5 4.5
[4,] 1.5 4.5
[5,] 1.5 4.5
[6,] 1.5 4.5
head(mc)
[1] 3.096186 3.176928 2.952425 3.217452 3.061712 2.988291
plot(c1, mc, xlim = c(1,5),xlab="c")
points(c2,mc)
points(c3,mc)
points(c4,mc)
abline(v=mean(c2))
abline(v=mean(c3))
(to be continued)
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hai-mn/hai-mn.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Nguyen (2022, April 4). HaiBiostat: Series 8 - Ordinal Response & Probit Model. Retrieved from https://hai-mn.github.io/posts/2022-04-04-Bayesian methods - Series 8 of 10/
BibTeX citation
@misc{nguyen2022series,
  author = {Nguyen, Hai},
  title = {HaiBiostat: Series 8 - Ordinal Response & Probit Model},
  url = {https://hai-mn.github.io/posts/2022-04-04-Bayesian methods - Series 8 of 10/},
  year = {2022}
}